Context reinforced neural topic modeling over short texts
Abstract
As one of the prevalent topic mining methods, neural topic modeling has attracted a lot of interest due to its advantages of low training costs and strong generalisation abilities. However, existing neural topic models may suffer from the feature sparsity problem when applied to short texts, owing to the lack of context in each message. To alleviate this issue, we propose a Context Reinforced Neural Topic Model (CRNTM), whose characteristics can be summarized as follows. First, by assuming that each short text covers only a few salient topics, the proposed CRNTM infers the topic for each word in a narrow range. Second, our model exploits pre-trained word embeddings by treating topics as multivariate Gaussian distributions or Gaussian mixture distributions in the embedding space. Extensive experiments on two benchmark corpora validate the effectiveness of the proposed model on both topic discovery and text classification.
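The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the diagonal covariance, the softmax normalisation over the vocabulary, the top-k truncation rule, and the random stand-in embeddings are all assumptions made for illustration.

```python
import numpy as np


def gaussian_topic_word_probs(embeddings, means, variances):
    """Return a (K, V) matrix whose row k is a distribution over the vocabulary.

    embeddings: (V, D) word vectors (random stand-ins for pre-trained embeddings)
    means:      (K, D) topic means in the embedding space
    variances:  (K, D) positive diagonal variances (a simplifying assumption)
    """
    diff = embeddings[None, :, :] - means[:, None, :]  # (K, V, D)
    log_density = -0.5 * np.sum(
        diff ** 2 / variances[:, None, :]
        + np.log(2.0 * np.pi * variances[:, None, :]),
        axis=-1,
    )  # (K, V): log N(word | topic mean, diagonal covariance)
    # Normalise the densities over the vocabulary (softmax in log space).
    log_density -= log_density.max(axis=1, keepdims=True)
    probs = np.exp(log_density)
    return probs / probs.sum(axis=1, keepdims=True)


def keep_top_k_topics(theta, k):
    """Zero all but the k largest topic weights per document, then renormalise.

    Assumes every row of theta has at least one positive weight among its top k.
    """
    theta = np.asarray(theta, dtype=float)
    top = np.argsort(theta, axis=1)[:, -k:]  # indices of the k largest weights
    mask = np.zeros_like(theta)
    np.put_along_axis(mask, top, 1.0, axis=1)
    narrowed = theta * mask
    return narrowed / narrowed.sum(axis=1, keepdims=True)


# Toy data: 6 words in a 4-dimensional embedding space, 2 topics.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 4))
means = embeddings[:2].copy()   # centre each topic on one word
variances = np.ones((2, 4))

beta = gaussian_topic_word_probs(embeddings, means, variances)
theta = keep_top_k_topics([[0.5, 0.3, 0.2]], k=2)
```

Here `beta` plays the role of a topic-word matrix derived from Gaussian topics, and `theta` shows a document's topic weights restricted to its two most salient topics. How CRNTM actually parameterises and trains these quantities is not stated in the abstract.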
Similar resources
Topic Modeling over Short Texts by Incorporating Word Embeddings
Inferring topics from the overwhelming amount of short texts becomes a critical but challenging task for many content analysis tasks, such as content charactering, user interest profiling, and emerging topic detecting. Existing methods such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) cannot solve this problem very well since only very limited word co-o...
Enhancing Topic Modeling on Short Texts with Crowdsourcing
Topic modeling is nowadays widely used in text archive analytics, to find significant topics in news articles and important aspects of product comments available on the Internet. While statistical approaches, e.g. Latent Dirichlet Allocation (LDA) and its variants, are effective on building topic models on long texts, it remains difficult to identify meaningful topics over short texts, e.g. new...
Topic Segmentation for Short Texts
Topic segmentation, which aims to find the boundaries between topic blocks in a text, is an important task for semantic analysis of texts. Although different solutions have been proposed for the task, many limitations and difficulties exist in the approaches. In particular, most of the methods do not work well for cases such as short texts, internet news and students' writings. In this paper, we f...
Exploring Social Context for Topic Identification in Short and Noisy Texts
With the pervasion of social media, topic identification in short texts attracts increasing attention in recent years. However, in nature the texts of social media are short and noisy, and the structures are sparse and dynamic, resulting in difficulty to identify topic categories exactly from online social media. Inspired by social science findings that preference consistency and social contagi...
Unsupervised Topic Modeling for Short Texts Using Distributed Representations of Words
We present an unsupervised topic model for short texts that performs soft clustering over distributed representations of words. We model the low-dimensional semantic vector space represented by the dense distributed representations of words using Gaussian mixture models (GMMs) whose components capture the notion of latent topics. While conventional topic modeling schemes such as probabilistic l...
Journal
Journal title: Information Sciences
Year: 2022
ISSN: 0020-0255, 1872-6291
DOI: https://doi.org/10.1016/j.ins.2022.05.098